29 research outputs found
Diffusion-Based Co-Speech Gesture Generation Using Joint Text and Audio Representation
This paper describes a system developed for the GENEA (Generation and
Evaluation of Non-verbal Behaviour for Embodied Agents) Challenge 2023. Our
solution builds on an existing diffusion-based motion synthesis model. We
propose a contrastive speech and motion pretraining (CSMP) module, which learns
a joint embedding for speech and gesture with the aim to learn a semantic
coupling between these modalities. The output of the CSMP module is used as a
conditioning signal in the diffusion-based gesture synthesis model in order to
achieve semantically-aware co-speech gesture generation. Our entry achieved
the highest human-likeness and the highest speech appropriateness ratings among
the submitted entries. This indicates that our system is a promising approach
to generating human-like co-speech gestures that carry semantic meaning in agents.
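The joint speech-gesture embedding described in this abstract can be illustrated with a generic contrastive objective. The sketch below is not the authors' CSMP module: it assumes a CLIP-style symmetric InfoNCE loss over paired embeddings, and the function names, the temperature of 0.07, and the toy vectors are all hypothetical.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)

def contrastive_loss(speech, motion, temperature=0.07):
    """Symmetric InfoNCE loss: speech[i] and motion[i] form a positive pair.

    Assumption: this mirrors CLIP-style pretraining, which may differ from
    the loss actually used in the paper.
    """
    s = [normalize(v) for v in speech]
    g = [normalize(v) for v in motion]
    logits = [[sum(a * b for a, b in zip(si, gj)) / temperature for gj in g]
              for si in s]
    logits_t = [list(col) for col in zip(*logits)]  # motion-to-speech direction
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))

# Toy embeddings (hypothetical): aligned pairs versus swapped pairs.
aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In this toy setting the loss is near zero when each speech vector points the same way as its paired motion vector, and large when the pairs are swapped, which is the behaviour a joint embedding is trained towards.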
Matcha-TTS: A fast TTS architecture with conditional flow matching
We introduce Matcha-TTS, a new encoder-decoder architecture for speedy TTS
acoustic modelling, trained using optimal-transport conditional flow matching
(OT-CFM). This yields an ODE-based decoder capable of high output quality in
fewer synthesis steps than models trained using score matching. Careful design
choices additionally ensure each synthesis step is fast to run. The method is
probabilistic, non-autoregressive, and learns to speak from scratch without
external alignments. Compared to strong pre-trained baseline models, the
Matcha-TTS system has the smallest memory footprint, rivals the speed of the
fastest models on long utterances, and attains the highest mean opinion score
in a listening test. Please see https://shivammehta25.github.io/Matcha-TTS/ for
audio examples, code, and pre-trained models.
Comment: 5 pages, 3 figures. Submitted to ICASSP 202
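The training target behind optimal-transport conditional flow matching can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Matcha-TTS implementation: it uses the standard OT-CFM straight-line path between a noise sample x0 and a data sample x1, plain Python lists in place of tensors, a hypothetical `sigma_min` value, and a fixed-step Euler solver standing in for the ODE-based decoder.

```python
def ot_cfm_pair(x0, x1, t, sigma_min=1e-4):
    """Return (x_t, u_t) for the OT-CFM path from noise x0 to data x1.

    x_t = (1 - (1 - sigma_min) * t) * x0 + t * x1
    u_t = x1 - (1 - sigma_min) * x0    (regression target for the network)
    sigma_min is an assumed small constant, not taken from the abstract.
    """
    scale = 1.0 - (1.0 - sigma_min) * t
    x_t = [scale * a + t * b for a, b in zip(x0, x1)]
    u_t = [b - (1.0 - sigma_min) * a for a, b in zip(x0, x1)]
    return x_t, u_t

def cfm_loss(predicted, target):
    """Mean squared error between a predicted vector field and the target."""
    return sum((p - q) ** 2 for p, q in zip(predicted, target)) / len(target)

def euler_sample(vector_field, x0, n_steps):
    """Integrate dx/dt = vector_field(x, t) from t=0 to t=1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / n_steps
    for i in range(n_steps):
        v = vector_field(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

# Toy example (hypothetical values): one noise/data pair, sigma_min set to 0.
noise, data = [1.0, 2.0], [3.0, 4.0]
_, target = ot_cfm_pair(noise, data, t=0.5, sigma_min=0.0)
# A "perfect" vector field that always returns the constant target.
sample = euler_sample(lambda x, t: target, noise, n_steps=4)
```

Because the target vector field along each path is constant in time, a coarse Euler integration already lands on the data point in this toy case, which gives some intuition for why such models can synthesise in few steps.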
Prosody-controllable spontaneous TTS with neural HMMs
Spontaneous speech has many affective and pragmatic functions that are
interesting and challenging to model in TTS (text-to-speech). However, the
presence of reduced articulation, fillers, repetitions, and other disfluencies
means that text and acoustics are less well aligned than in read speech. This is
problematic for attention-based TTS. We propose a TTS architecture that is
particularly suited for rapidly learning to speak from irregular and small
datasets while also reproducing the diversity of expressive phenomena present
in spontaneous speech. Specifically, we modify an existing neural HMM-based TTS
system, which is capable of stable, monotonic alignments for spontaneous
speech, and add utterance-level prosody control, so that the system can
represent the wide range of natural variability in a spontaneous speech corpus.
We objectively evaluate control accuracy and perform a subjective listening
test to compare to a system without prosody control. To exemplify the power of
combining mid-level prosody control and ecologically valid data for reproducing
intricate spontaneous speech phenomena, we evaluate the system's capability of
synthesizing two types of creaky phonation. Audio samples are available at
https://hfkml.github.io/pc_nhmm_tts/
Comment: 5 pages, 3 figures, Submitted to ICASSP 202
Staging Orthodontic Aligners for Complex Orthodontic Tooth Movement
Orthodontics has recently seen a growing shift toward aligner therapy. For years, orthodontists have used fixed preadjusted appliances for orthodontic treatment. Even though fixed appliances have been highly efficient in the treatment of orthodontic malocclusions, they are not as esthetic as clear aligners. The purpose of this article is to review the staging of orthodontic tooth movement (OTM) with aligner therapy.
ANTIFERTILITY ACTIVITY AND CONTRACEPTIVE POTENTIAL OF THE HYDROALCOHOLIC RHIZOME EXTRACT OF TRILLIUM GOVANIANUM IN FEMALE WISTAR RATS
Objective: Trillium govanianum is used in several traditional medicines containing steroids and sex hormones for the management of inflammation, menstrual disorders, and sex-related disorders, and as an antiseptic. The present study aimed to investigate the antifertility potential of the hydroalcoholic rhizome extract of T. govanianum and to explore the possible mechanism of action. Methods: The anti-implantation activity of T. govanianum rhizome extract (125 and 250 mg/kg; p.o.) was assessed in female Wistar rats with proven fertility, and its estrogenic/antiestrogenic effect was evaluated in ovariectomized females. 17-α-ethinylestradiol (1 μg/rat/day; s.c.) or plant extract was administered for 11 days, after which the animals were sacrificed. Percentage inhibition of implantation sites, serum estrogen levels, changes in body and uterus weight, and morphological alterations in the uterus and ovaries were evaluated. Results: T. govanianum treatment increased uterus weight and induced a dose-dependent anti-implantation effect, with 100% implantation inhibition at the 250 mg/kg dose. The anti-implantation effects of T. govanianum were associated with endometrial thickening and significantly elevated serum estrogen levels. Moreover, the estrogenic/antiestrogenic studies revealed that T. govanianum possessed a strong estrogenic effect; however, the effect was saturable. Conclusion: T. govanianum possesses antifertility activity, which can be attributed to its strong estrogenic potential and uterine thickening. Moreover, it could find clinical application as a safer and efficacious herbal birth control remedy.
Financial Literacy at WPI: An Investigation into the Current State and Recommendations for Educational Improvement
Financial literacy is important because it leads to better financial decision making, but it has been severely lacking in college students due to poor or nonexistent high school education. Our goals were to assess the current state of WPI student financial literacy, to determine whether past education has been effective, and to find the optimal educational method for WPI students. These goals were accomplished by surveying and interviewing WPI undergraduates, graduates, alumni, and professionals in the field of finance. We then synthesized the collected data, drew conclusions, and offered recommendations for improving and maintaining the current programs in order to foster a better financial literacy educational program.
Design of a Stimuli Delivery System for Use in MRIs
Treating neurological issues requires an understanding of brain mechanisms, which may be studied using functional neuroimaging of awake animal test subjects. The purpose of this project was to aid in this research by developing a system that could reliably and semi-quantitatively deliver airborne stimuli to test subjects undergoing MRI regimes. Up to 4 stimuli were reliably delivered through use of a compressor, tank, flow meter, pressure regulator, and solenoid valves, and odor strengths were adjusted by altering flow rates. After delivery, odorants were evacuated through a separate outlet, filtered, and released outside the MRI room. Future uses for the device include research into addiction and fear mitigation as well as commercial uses such as scent marketing and virtual reality.
Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis
With read-aloud speech synthesis achieving high naturalness scores, there is
a growing research interest in synthesising spontaneous speech. However, human
spontaneous face-to-face conversation has both spoken and non-verbal aspects
(here, co-speech gestures). Only recently has research begun to explore the
benefits of jointly synthesising these two modalities in a single system. The
previous state of the art used non-probabilistic methods, which fail to capture
the variability of human speech and motion, and risk producing oversmoothing
artefacts and sub-optimal synthesis quality. We present the first
diffusion-based probabilistic model, called Diff-TTSG, that jointly learns to
synthesise speech and gestures together. Our method can be trained on small
datasets from scratch. Furthermore, we describe a set of careful uni- and
multi-modal subjective tests for evaluating integrated speech and gesture
synthesis systems, and use them to validate our proposed approach. For
synthesised examples please see https://shivammehta25.github.io/Diff-TTSG
Comment: 7 pages, 2 figures, Accepted at ISCA Speech Synthesis Workshop (SSW) 202
OverFlow: Putting flows on top of neural transducers for better TTS
Neural HMMs are a type of neural transducer recently proposed for
sequence-to-sequence modelling in text-to-speech. They combine the best
features of classic statistical speech synthesis and modern neural TTS,
requiring less data and fewer training updates, and are less prone to gibberish
output caused by neural attention failures. In this paper, we combine neural
HMM TTS with normalising flows for describing the highly non-Gaussian
distribution of speech acoustics. The result is a powerful, fully probabilistic
model of durations and acoustics that can be trained using exact maximum
likelihood. Experiments show that a system based on our proposal needs fewer
updates than comparable methods to produce accurate pronunciations and a
subjective speech quality close to natural speech. Please see
https://shivammehta25.github.io/OverFlow/ for audio examples and code.
Comment: 5 pages, 2 figures. Accepted for publication at Interspeech 202
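The claim that a normalising flow can describe a non-Gaussian distribution while still allowing exact maximum-likelihood training rests on the change-of-variables formula. The sketch below is a one-dimensional toy, not the OverFlow architecture: it assumes a standard normal base variable x pushed through the fixed invertible map y = sinh(x), and the function names are hypothetical.

```python
import math

def flow_log_likelihood(y):
    """Exact log-density of y = sinh(x) with x ~ N(0, 1).

    Change of variables: log p_Y(y) = log p_X(asinh(y)) + log |d asinh(y)/dy|,
    where d asinh(y)/dy = 1 / sqrt(1 + y^2).
    """
    x = math.asinh(y)  # invert the flow
    log_base = -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)
    log_det = -0.5 * math.log(1.0 + y * y)  # log-Jacobian of the inverse map
    return log_base + log_det

def gaussian_log_likelihood(y):
    """Log-density of a standard normal, for comparison."""
    return -0.5 * y * y - 0.5 * math.log(2.0 * math.pi)

# The transformed density has much heavier tails than the Gaussian base.
tail_flow = flow_log_likelihood(5.0)
tail_gauss = gaussian_log_likelihood(5.0)
```

A trainable flow would replace the fixed sinh map with a parameterised invertible network, but the log-likelihood would still be computed in the same way: the base log-density at the inverted point plus the log-Jacobian term.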